Introduction

This report investigates B-cell subtypes within pre- and post-menopausal Visium spatial transcriptomics samples. The research focuses on whether double-negative B-cell subtypes—also known as atypical memory cells or age-associated B-cells—could explain differences in patient outcomes and tumor composition between these groups.

Pre-menopausal patients often present with more aggressive tumors and more severe diagnoses compared to their post-menopausal counterparts. This analysis explores how variations in B-cell subtypes may contribute to these differences. The report details the approaches taken and the conclusions drawn from this investigation.

Initialization of the data pipeline in Python and R

The overall pipeline is implemented in both Python and R. The 10x Visium spatial transcriptomics datasets were first imported and processed using the Squidpy and Scanpy packages. Gene names were made unique to avoid duplicates, and mitochondrial genes were identified by the “MT-” prefix. Quality control (QC) consisted of calculating the percentage of mitochondrial counts per spot, removing spots with fewer than 600 counts or fewer than 500 detected genes, and removing spots with more than 10% mitochondrial content. Genes expressed in fewer than 10 spots were also removed. After filtering, the data were normalized by scaling each spot to a total count of 10,000 and then log-transformed. Cell type annotations were obtained using the pre-trained CellTypist model (Immune_All_High), with majority voting to assign the most likely cell type to each cluster. The resulting AnnData objects contained gene expression, clustering, and cell type annotation information for downstream analysis.
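As a rough illustration of the QC thresholds and normalization described above, the following sketch restates them in plain NumPy on a spots-by-genes count matrix. It is not the pipeline's actual code, which relied on Scanpy and Squidpy; the function name `qc_and_normalize`, its parameter names, and the order in which the filters are applied are assumptions made for this example.

```python
import numpy as np

def qc_and_normalize(counts, gene_names, min_counts=600, min_genes=500,
                     max_pct_mt=10.0, min_spots=10, target_sum=1e4):
    """Hypothetical helper: apply the QC thresholds from the text, then
    normalize each spot to `target_sum` counts and log-transform."""
    counts = np.asarray(counts, dtype=float)          # shape: (spots, genes)
    gene_names = np.asarray(gene_names).astype(str)

    # Mitochondrial genes are identified by the "MT-" prefix.
    is_mt = np.char.startswith(gene_names, "MT-")

    # Per-spot QC metrics.
    total = counts.sum(axis=1)
    n_genes = (counts > 0).sum(axis=1)
    pct_mt = 100.0 * counts[:, is_mt].sum(axis=1) / np.maximum(total, 1.0)

    # Keep spots with >= 600 counts, >= 500 genes, and <= 10% mitochondrial.
    keep_spots = (total >= min_counts) & (n_genes >= min_genes) & (pct_mt <= max_pct_mt)
    counts = counts[keep_spots]

    # Keep genes detected in at least `min_spots` of the remaining spots
    # (assumption: gene filtering happens after spot filtering).
    keep_genes = (counts > 0).sum(axis=0) >= min_spots
    counts = counts[:, keep_genes]

    # Scale every spot to `target_sum` total counts, then apply log(1 + x).
    totals = counts.sum(axis=1, keepdims=True)
    normalized = np.log1p(counts / np.maximum(totals, 1.0) * target_sum)
    return normalized, keep_spots, keep_genes
```

The function returns the boolean masks alongside the normalized matrix so that the same selection can be applied to spot-level metadata such as spatial coordinates.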

Subsequently, the samples were filtered based on binarized images derived from the recommendations of pathologists who outlined the location of the tumor in each slide. Exceptionally, in one sample (Post-06), a portion of the tumor was discarded due to abnormal morphology. All samples were then converted to Seurat objects and further processed in R. Additional metrics were calculated and incorporated into the Seurat objects for further analysis, including data integration-related metrics with multiple reference datasets and numerous gene signature scores. These and other analyses are described below.

Subsetting based on the input from the pathologists

As mentioned above, we consulted pathologists, who outlined the region of each tissue section to focus on based on tumor morphology. Restricting the analysis to these regions should reduce some of the variance between samples, particularly because the fraction of each slide occupied by tumor differed between samples.

The outlines from each sample were used to generate a black-and-white image mask in ImageJ (panel A). These masks were then loaded into Python as binary arrays. For each sample, the spatial coordinates of the sequencing spots were scaled and rounded to match the dimensions of the high-resolution image used to generate the mask. Any spot whose coordinates corresponded to a mask value of 0 was discarded. The panels below show the spatial coordinates before filtering (panel C) and after filtering (panel D). Panel B shows the assigned cell type identity of each spot after spatial filtering.
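The mask lookup described above can be sketched as follows. This is a minimal example under stated assumptions rather than the code used in the analysis: the function name `filter_spots_by_mask` is hypothetical, coordinates are assumed to be (x, y) pixel positions in the full-resolution image, and `scale_factor` is assumed to be the ratio between the high-resolution image and the full-resolution image. In AnnData objects read with Scanpy or Squidpy, the coordinates are typically found in `adata.obsm["spatial"]`.

```python
import numpy as np

def filter_spots_by_mask(coords_fullres, mask, scale_factor):
    """Hypothetical helper: return a boolean array marking spots that fall
    on non-zero pixels of a binary mask.

    coords_fullres : (n_spots, 2) array of (x, y) full-resolution pixel positions
    mask           : 2D array indexed as [row (y), column (x)]; 0 = discard
    scale_factor   : factor mapping full-resolution pixels to mask pixels
    """
    coords = np.asarray(coords_fullres, dtype=float)
    mask = np.asarray(mask)

    # Scale to the mask's resolution and round to the nearest pixel.
    pixels = np.rint(coords * scale_factor).astype(int)
    cols, rows = pixels[:, 0], pixels[:, 1]

    # Spots that land outside the image are treated as outside the mask.
    in_bounds = (
        (rows >= 0) & (rows < mask.shape[0]) &
        (cols >= 0) & (cols < mask.shape[1])
    )

    keep = np.zeros(len(coords), dtype=bool)
    keep[in_bounds] = mask[rows[in_bounds], cols[in_bounds]] > 0
    return keep
```

The returned boolean array can then be used to subset the spots, for example with `adata[keep].copy()` on an AnnData object.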

Sample quality metrics

XXX as you can see blah blah blah XXX

Verifying the CellTypist-assigned B-cell identities

XXX as you can see blah blah blah XXX

Distribution of CellTypist-assigned cell types

Examination of gene signatures

XXX as you can see blah blah blah XXX

Correlation of gene signatures with CellTypist identities

Comparison of samples with bulk RNA sequencing data

XXX as you can see blah blah blah XXX

Examination of the double-negative B-cell gene signature across samples

The images below show the double-negative B-cell gene signature score for each sample:

Examination of the double-negative B-cell cosine similarity across samples

The images below show the double-negative B-cell cosine similarity values for each sample: